ABSTRACT

As structured documents with rich metadata (such as products, movies,
etc.) become increasingly prevalent, searching those documents has
become an important IR problem. Although advanced search interfaces are
widely available, most users still prefer keyword-based queries for
searching those documents. Query keywords often imply hidden
restrictions on the desired documents, which can be represented as
document facet-value pairs. To achieve high retrieval performance, it is
important to identify the relevant facet-value pairs hidden in a query.
In this paper, we study the problem of identifying document facet-value
pairs that are relevant to a keyword-based search query. We propose a
machine learning approach and a set of useful features, and evaluate our
approach using a movie data set from INEX.
1. INTRODUCTION

Structured documents with rich metadata are increasingly prevalent on
the Internet. A large portion of structured documents represent various
types of entities, such as products, movies, images, businesses, jobs,
people, etc. In these documents, each metadata field characterizes a
specific facet of the entity, and may be assigned one or several values.
In this paper, we call each metadata field a document facet, and a
metadata field assigned a particular value a facet-value pair (FVP).
Figure 1 shows a movie document, where the bold words are facets, each
of which is followed by the value(s) of the facet. In this document, the
facet “genre” has three values: “Action”, “Adventure”, and “Sci-Fi”,
which correspond to three facet-value pairs: “genre: Action”, “genre:
Adventure”, and “genre: Sci-Fi”.

Figure 1: An example of a structured document that describes a movie

A keyword-based query searching for structured documents usually implies
a set of facet-value pairs that jointly define the information need
behind the query. For example, the query “15-inch silver laptop by
Lenovo” implies that the relevant products must meet the following
restrictions: “product category: laptop”, “maker: Lenovo”, “color:
silver”, “screen size: 15-inch”. If the system can identify the relevant
facet-value pairs behind a query, it can return more accurate documents
to the user.
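To make the laptop example concrete, the following minimal sketch treats each document as a set of (facet, value) tuples and keeps only the documents that satisfy every implied restriction. The catalogue contents, the document IDs, and the helper name `matches` are hypothetical and serve only as illustration; the paper does not prescribe how identified FVPs are applied at retrieval time.

```python
from typing import Dict, FrozenSet, List, Tuple

FVP = Tuple[str, str]  # (facet, value)


def matches(doc_fvps: FrozenSet[FVP], required: List[FVP]) -> bool:
    """Return True if the document carries every required facet-value pair."""
    return all(fvp in doc_fvps for fvp in required)


# Hypothetical product catalogue: document ID -> set of facet-value pairs.
catalog: Dict[str, FrozenSet[FVP]] = {
    "p1": frozenset({("product category", "laptop"), ("maker", "Lenovo"),
                     ("color", "silver"), ("screen size", "15-inch")}),
    "p2": frozenset({("product category", "laptop"), ("maker", "Lenovo"),
                     ("color", "black"), ("screen size", "15-inch")}),
    "p3": frozenset({("product category", "tablet"), ("maker", "Lenovo"),
                     ("color", "silver"), ("screen size", "10-inch")}),
}

# FVPs implied by the query "15-inch silver laptop by Lenovo".
required: List[FVP] = [
    ("product category", "laptop"),
    ("maker", "Lenovo"),
    ("color", "silver"),
    ("screen size", "15-inch"),
]

hits = sorted(doc_id for doc_id, fvps in catalog.items() if matches(fvps, required))
print(hits)
```

Hard filtering is only one possible use; the identified pairs could equally be used as soft ranking signals.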

In this paper, we study the problem of identifying relevant facet-value
pairs from query keywords. The central question is: given a keyword
query, how can we find the relevant facet-value pairs that the retrieved
documents should have? To answer this question, we propose a
learning-based approach and a set of features, and evaluate our approach
on a movie data set from INEX. Experiment results show that the
learning-based approach can effectively find relevant facet-value pairs
and that the proposed features are useful.

In our previous work [B], we proposed several heuristic approaches for
ranking facet-value pairs based on relevance to the query. The approach
in this paper differs in two respects. First, we focus on highly
structured documents where metadata dominates each document. Second, we
use a learning-based approach for FVP ranking. Compared with heuristic
approaches, learning-based approaches usually perform better because
they can combine multiple relevance signals. In fact, some of the
features we propose in this paper are equivalent to the best approaches
proposed in [B], and our experiment results show that the learning-based
approach can dramatically improve FVP ranking performance (Table 3).

2. A LEARNING-BASED APPROACH 

To obtain labeled data for model training, we hire a human assessor to
provide relevance judgments between queries and facet-value pairs. As
the learner, we use Gradient Boosted Trees (GBT) [^], which can handle
deep interactions among features and have been shown to perform well in
many IR tasks such as learning to rank [4].

2.1 Features 

We propose a number of features that measure the degree of relevance
between a query and a facet-value pair. These features fall into seven
categories based on the type of information used. Table 1 summarizes all
the features.

2.1.1 Query Features 

Features of this type depend only on the query. We use two features:
the query length (number of words), and the average IDF of all query
words. The average IDF is used to measure the uniqueness of a query.
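As a rough illustration, the two query features could be computed as follows. The paper does not state which IDF variant it uses, so the smoothed formula log((N + 1) / (df + 1)), the whitespace tokenizer, and the toy document frequencies are all assumptions of this sketch.

```python
import math
from typing import Dict


def idf(term: str, doc_freq: Dict[str, int], num_docs: int) -> float:
    """Smoothed inverse document frequency (an assumed variant)."""
    return math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1))


def query_features(query: str, doc_freq: Dict[str, int], num_docs: int) -> Dict[str, float]:
    """Compute Q.Length and Q.AvgIDF for a keyword query."""
    words = query.lower().split()
    avg_idf = sum(idf(w, doc_freq, num_docs) for w in words) / len(words) if words else 0.0
    return {"Q.Length": len(words), "Q.AvgIDF": avg_idf}


# Hypothetical document frequencies in a corpus of 1000 documents.
doc_freq = {"comedy": 400, "woody": 9, "allen": 19}
features = query_features("Comedy Woody Allen", doc_freq, num_docs=1000)
print(features)
```

A query made of rare words, such as a person's name, receives a higher Q.AvgIDF than a query made of common words, such as a genre.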

2.1.2 Facet Features 

Features of this type depend only on the facet. The first feature
(F.Type) is a categorical feature that identifies the facet (the number
of unique facets is usually small). The second feature (F.NumValues) is
the number of unique values the facet has in the whole corpus. The third
feature (F.NumOccrs) is the total number of occurrences of all
facet-value pairs of this facet in the whole corpus.

2.1.3 Value Features 

Features of this type depend only on the value. Two features are used:
V.Length is the number of words contained in the value, and V.AvgIDF is
the average IDF of all value words.

2.1.4 FVP Features 

Features of this type depend on the facet-value pair. P.NumDocs is the
number of documents with this facet-value pair. P.IDF is the inverse
document frequency of this FVP.
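The corpus-level statistics behind the facet features (F.NumValues, F.NumOccrs) and the FVP features (P.NumDocs, P.IDF) can be gathered in a single pass over the collection. The sketch below assumes each document is a set of (facet, value) tuples, counts a pair at most once per document, and uses log(N / P.NumDocs) for P.IDF; these choices and the three toy documents are assumptions, not details given in the paper.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

FVP = Tuple[str, str]  # (facet, value)


def corpus_stats(docs: List[Set[FVP]]):
    """Count documents per FVP, unique values per facet, and occurrences per facet."""
    pair_docs: Counter = Counter()
    facet_values: Dict[str, Set[str]] = defaultdict(set)
    facet_occurrences: Counter = Counter()
    for fvps in docs:
        for facet, value in fvps:
            pair_docs[(facet, value)] += 1
            facet_values[facet].add(value)
            facet_occurrences[facet] += 1
    return pair_docs, facet_values, facet_occurrences


def static_features(pair: FVP, docs: List[Set[FVP]]) -> Dict[str, float]:
    """Compute the query-independent features of one facet-value pair."""
    pair_docs, facet_values, facet_occurrences = corpus_stats(docs)
    facet = pair[0]
    num_docs_with_pair = pair_docs[pair]
    return {
        "F.NumValues": len(facet_values[facet]),
        "F.NumOccrs": facet_occurrences[facet],
        "P.NumDocs": num_docs_with_pair,
        # Assumed IDF variant; undefined pairs get 0.0.
        "P.IDF": math.log(len(docs) / num_docs_with_pair) if num_docs_with_pair else 0.0,
    }


# Hypothetical three-movie corpus.
docs = [
    {("genre", "Action"), ("genre", "Sci-Fi"), ("director", "James Cameron")},
    {("genre", "Action"), ("director", "Woody Allen")},
    {("genre", "Comedy"), ("director", "Woody Allen")},
]
feats = static_features(("genre", "Action"), docs)
print(feats)
```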

2.1.5 Query-Facet Features 

Features of this type measure the similarity between the query and the
facet. QF.TFIDF is the TFIDF score between the query and the facet name.
This feature is motivated by the intuition that users might use the
facet name to express a faceted constraint on the returned documents.
For example, the query “movies directed by James Cameron” is related to
the facet “director”.

2.1.6 Query-Value Features 

Features of this type measure the similarity between the query and the
value. We use four features based on four traditional IR scoring
methods. QV.BM25 and QV.TFIDF are the BM25 and TFIDF scores between the
query and the value. QV.SIDF differs from QV.TFIDF in that it ignores
the term frequency part: it is the sum of the IDFs of the words shared
by the query and the value. QV.CosSim is the cosine similarity, where
the query and value vectors are calculated using the TFIDF weighting
method. Of these four features, QV.BM25 and QV.CosSim penalize long
values while the other two do not.
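The difference in length penalty among the four query-value features can be seen in a small sketch. The exact scoring formulas are not given in the paper, so the versions below are common textbook forms and should be read as assumptions: BM25 uses k1 = 1.2 and b = 0.75 with a supplied average value length, and the IDF table is a hypothetical lookup.

```python
import math
from collections import Counter
from typing import Dict


def qv_features(query: str, value: str, idf: Dict[str, float],
                avg_value_len: float, k1: float = 1.2, b: float = 0.75) -> Dict[str, float]:
    """Compute QV.SIDF, QV.TFIDF, QV.BM25 and QV.CosSim (assumed formulas)."""
    q = Counter(query.lower().split())
    v = Counter(value.lower().split())
    v_len = sum(v.values())
    overlap = [w for w in q if w in v]

    # Sum of IDFs of overlapping words; term frequency is ignored.
    sidf = sum(idf.get(w, 0.0) for w in overlap)
    # Term frequency in the value times IDF; no length normalization.
    tfidf = sum(v[w] * idf.get(w, 0.0) for w in overlap)
    # BM25 normalizes term frequency by the value length.
    length_norm = k1 * (1 - b + b * v_len / avg_value_len)
    bm25 = sum(idf.get(w, 0.0) * v[w] * (k1 + 1) / (v[w] + length_norm) for w in overlap)

    # Cosine similarity over TFIDF-weighted vectors; longer values have larger norms.
    q_vec = {w: q[w] * idf.get(w, 0.0) for w in q}
    v_vec = {w: v[w] * idf.get(w, 0.0) for w in v}
    dot = sum(weight * v_vec.get(w, 0.0) for w, weight in q_vec.items())
    norm = math.sqrt(sum(x * x for x in q_vec.values())) * math.sqrt(sum(x * x for x in v_vec.values()))
    cos_sim = dot / norm if norm else 0.0

    return {"QV.SIDF": sidf, "QV.TFIDF": tfidf, "QV.BM25": bm25, "QV.CosSim": cos_sim}


# Hypothetical IDF values.
idf = {"the": 0.1, "woody": 5.0, "allen": 4.0, "collection": 3.0, "volume": 3.0, "one": 1.0}
short = qv_features("woody allen", "Woody Allen", idf, avg_value_len=3.0)
long_ = qv_features("woody allen", "The Woody Allen Collection Volume One", idf, avg_value_len=3.0)
print(short)
print(long_)
```

In this toy case both values match the query words equally often, so QV.SIDF and QV.TFIDF cannot tell them apart, while QV.BM25 and QV.CosSim prefer the shorter value.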

2.1.7 Query-FVP Features 

Features of this type depend on both the query and the facet-value pair,
and are mainly based on how often the FVP occurs in the top retrieved
documents. QP.DFN measures how many of the top N retrieved documents
have the FVP. We set N to 10, 100, 1000, and the number of all retrieved
documents, respectively. QP.DFIDFN is the product of QP.DFN and the IDF
of the FVP. This group of features is motivated by the intuition that a
facet-value pair that occurs frequently in the top retrieved documents
but infrequently in the whole corpus is more likely to be relevant to
the query.
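The QP.DFN and QP.DFIDFN features described above can be sketched as follows, given a ranked list of retrieved document IDs. The function name, the data layout, and the toy ranking are hypothetical; the IDF of the pair is passed in as a precomputed number.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

FVP = Tuple[str, str]  # (facet, value)


def qp_features(pair: FVP, ranked_docs: List[str], doc_fvps: Dict[str, Set[FVP]],
                pair_idf: float,
                cutoffs: Sequence[Optional[int]] = (10, 100, 1000, None)) -> Dict[str, float]:
    """Count how many of the top-N retrieved documents carry the pair.

    A cutoff of None stands for all retrieved documents.
    """
    feats: Dict[str, float] = {}
    for n in cutoffs:
        top = ranked_docs if n is None else ranked_docs[:n]
        df = sum(1 for doc_id in top if pair in doc_fvps[doc_id])
        suffix = "All" if n is None else str(n)
        feats[f"QP.DF{suffix}"] = df
        feats[f"QP.DFIDF{suffix}"] = df * pair_idf
    return feats


# Hypothetical ranking of 20 documents; the pair occurs in ranks 1-6 and rank 16.
pair = ("director", "Woody Allen")
ranked_docs = [f"d{i}" for i in range(20)]
doc_fvps = {d: set() for d in ranked_docs}
for i in (0, 1, 2, 3, 4, 5, 15):
    doc_fvps[f"d{i}"].add(pair)

feats = qp_features(pair, ranked_docs, doc_fvps, pair_idf=2.0)
print(feats)
```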


Table 1: Features for ranking facet-value pairs

Type         ID            Detail
-----------  ------------  ---------------------------------------------------
Query        Q.Length      Number of words in the query
             Q.AvgIDF      Average IDF of query words
Facet        F.Type        The facet type (categorical)
             F.NumValues   Number of unique values of this facet
             F.NumOccrs    Number of occurrences of all values of this facet
Value        V.Length      Number of words in the value
             V.AvgIDF      Average IDF of value words
FVP          P.NumDocs     Number of documents containing this FVP
             P.IDF         IDF of this FVP
Query-Facet  QF.TFIDF      TFIDF score between the query and the facet name
Query-Value  QV.TFIDF      TFIDF score between the query and the value
             QV.BM25       BM25 score between the query and the value
             QV.SIDF       Sum of IDFs of the overlapped words between the
                           query and the value
             QV.CosSim     Cosine similarity between the query and the value
Query-FVP    QP.DF10       FVP frequency in the top 10 retrieved documents
             QP.DFIDF10    QP.DF10 * IDF of the FVP
             QP.DF100      FVP frequency in the top 100 retrieved documents
             QP.DFIDF100   QP.DF100 * IDF of the FVP
             QP.DF1000     FVP frequency in the top 1000 retrieved documents
             QP.DFIDF1000  QP.DF1000 * IDF of the FVP
             QP.DFAll      FVP frequency in all retrieved documents
             QP.DFIDFAll   QP.DFAll * IDF of the FVP


3. EXPERIMENTS 
3.1 Data Set 

We use the data set from the data-centric track of INEX 2010 [3]. This
data set consists of: 1) the IMDB movie collection with 1,594,513
movies; 2) 26 query topics (title, description, and narrative) created
by the track participants.

Table 2: A query example in the data set

Title:          Comedy Woody Allen Scarlett Johansson
Description:    Comedy movies directed by Woody Allen and acted by
                Scarlett Johansson.
Narrative:      I am looking for the comedy movies directed by Woody
                Allen and acted by Scarlett Johansson.
Relevant FVPs:  Genre: Comedy
                Director: Woody Allen
                Actor: Scarlett Johansson

To prepare labeled facet-value pairs for model training, we hire a human
assessor to provide relevance judgments on query-FVP pairs. To obtain
FVP candidates, all FVPs are ranked by the BM25 score between the query
and the value (feature QV.BM25 in Table 1), and the top 100 FVPs of each
query are kept for relevance judging. As a result, we have a total of
2,600 query-FVP relevance judgments (26 queries with 100 candidates
each), of which 148 are relevant. An example query and the relevant
facet-value pairs labeled by the human assessor are shown in Table 2.
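The candidate pooling step can be sketched as a top-k selection per query. In the sketch below, `overlap_score` is a deliberately simple stand-in for the QV.BM25 score, and the queries and FVPs are synthetic; only the pool size of 100 and the count of 26 queries come from the text.

```python
import heapq
from typing import Callable, Dict, List, Tuple

FVP = Tuple[str, str]  # (facet, value)


def candidate_pool(queries: List[str], fvps: List[FVP],
                   score: Callable[[str, FVP], float], k: int = 100) -> Dict[str, List[FVP]]:
    """Keep the k highest-scoring FVPs of each query for relevance judging."""
    return {q: heapq.nlargest(k, fvps, key=lambda pair: score(q, pair)) for q in queries}


def overlap_score(query: str, pair: FVP) -> float:
    """Stand-in for QV.BM25: number of words shared by the query and the value."""
    return float(len(set(query.lower().split()) & set(pair[1].lower().split())))


# Synthetic data: 26 queries and 150 candidate pairs.
queries = [f"person {i}" for i in range(26)]
fvps = [("actor", f"person {i}") for i in range(150)]

pool = candidate_pool(queries, fvps, overlap_score, k=100)
num_judgments = sum(len(candidates) for candidates in pool.values())
print(num_judgments)
```

With 26 queries and 100 candidates each, the pool holds 2,600 query-FVP pairs, which matches the number of judgments reported above.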

3.2 Experiment Settings 

We use the package from [1] for training gradient boosted trees. To
determine the number of trees in GBT, we choose the number that gives
the best MAP, rather than the smallest error, on the validation set.
Regarding the parameters of GBT, we use the Bernoulli distribution, a
maximum of 3000 trees, an interaction depth of 5, a minimum of 10
observations in each tree node, and a shrinkage parameter of 0.01. We
use 10-fold cross validation to evaluate our approach, and report the
average performance over all folds.
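Choosing the number of trees by validation MAP can be sketched independently of any particular GBT implementation. The code below assumes that, for each candidate number of trees, the validation FVPs of every query have already been scored and sorted, so each ranking is a list of 0/1 relevance labels in ranked order. The function names and the toy rankings are hypothetical, and the average precision here is normalized by the number of relevant items present in the ranking.

```python
from typing import Dict, List


def average_precision(ranked_labels: List[int]) -> float:
    """Average precision of one ranking of 0/1 relevance labels."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: List[List[int]]) -> float:
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in rankings) / len(rankings) if rankings else 0.0


def pick_num_trees(rankings_by_num_trees: Dict[int, List[List[int]]]) -> int:
    """Return the tree count with the best validation MAP (smallest count on ties)."""
    return max(sorted(rankings_by_num_trees),
               key=lambda n: mean_average_precision(rankings_by_num_trees[n]))


# Hypothetical validation rankings for two queries at three model sizes.
validation = {
    100: [[0, 1], [0, 0, 1]],
    500: [[1, 0], [0, 1, 0]],
    3000: [[1, 0], [0, 0, 1]],
}
best = pick_num_trees(validation)
print(best)
```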

We use the BM25 algorithm implemented in Lemur as the 
document retrieval approach throughout our experiments. 
Before scoring a document, we remove all XML tags, and 
treat it as an unstructured document. 

In our approaches, we assume only the title part of each query is
available, since keywords in the title part are typical queries in
commercial search engines. The query descriptions and narratives are
only used to help the human assessor make relevance judgments of queries
and facet-value pairs.

4. EXPERIMENT RESULTS 

Table 3 shows the ranking performance of each FVP-ranking approach. GBT
is the learning-based approach that uses all features in Table 1. The
other 12 approaches are each based on an individual feature. We do not
use the remaining features individually because they do not directly
measure the degree of relevance between query and FVP, and thus are not
suitable for FVP ranking on their own.
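Two of the ranking measures reported for these approaches, P@5 and R-Prec, can be computed from a ranked list of 0/1 relevance labels as in the sketch below. The sketch assumes that every relevant FVP of the query appears somewhere in the ranked list, so that R can be taken as the number of relevant labels in the list; the example ranking is made up.

```python
from typing import List


def precision_at_k(ranked_labels: List[int], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    return sum(ranked_labels[:k]) / k if k > 0 else 0.0


def r_precision(ranked_labels: List[int]) -> float:
    """Precision at rank R, where R is the number of relevant items in the list."""
    num_relevant = sum(ranked_labels)
    return precision_at_k(ranked_labels, num_relevant) if num_relevant else 0.0


# Hypothetical ranking with three relevant FVPs at ranks 1, 3 and 6.
ranking = [1, 0, 1, 0, 0, 1]
p_at_5 = precision_at_k(ranking, 5)
r_prec = r_precision(ranking)
print(p_at_5, r_prec)
```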

Table 3: Performance of different FVP-ranking approaches. GBT is the
learning-based approach, and the other approaches use individual
features. GBT significantly outperforms all the other approaches under a
paired t-test (p-value < 0.05).

Approach      MAP   R-Prec  P@5   P@R=1
------------  ----  ------  ----  -----
QV.TFIDF      0.18  0.09    0.08  0.11
QV.SIDF       0.18  0.15    0.14  0.10
QP.DF10       0.30  0.25    0.22  0.23
QP.DFIDF10    0.32  0.28    0.22  0.24
QP.DFAll      0.33  0.21    0.30  0.21
QP.DFIDFAll   0.33  0.21    0.30  0.21
QV.CosSim     0.37  0.24    0.33  0.23
QV.BM25       0.42  0.29    0.34  0.28
QP.DF1000     0.48  0.38    0.29  0.36
QP.DFIDF1000  0.51  0.43    0.29  0.38
QP.DFIDF100   0.53  0.43    0.29  0.37
QP.DF100      0.53  0.44    0.29  0.37
GBT           0.73  0.61    0.45  0.55

Based on Table 3, we have the following findings:

• GBT dramatically outperforms all individual features (MAP of 0.73
  versus 0.53 for the best individual feature). This demonstrates the
  advantage of the learning-based approach, and indicates that the
  features we proposed are complementary to each other.

• QP (Query-FVP) features generally perform better than QV (Query-Value)
  features, which means the frequency among top retrieved documents
  (measured by QP features) is a stronger signal than the pure text
  match between the value and the query.

• Among all QP features, QP.DF100 and QP.DFIDF100 perform the best,
  which suggests that 100 is a reasonable cutoff on top retrieved
  documents for the data set we use.

• Among all QV features, QV.BM25 and QV.CosSim outperform QV.TFIDF and
  QV.SIDF significantly (MAP of 0.42 and 0.37 versus 0.18). The major
  difference between these features is that QV.BM25 and QV.CosSim
  normalize term frequency based on the value length of the FVP, while
  the other two do not. Given the dramatically different performance, we
  conclude that length normalization is very important for FVP ranking,
  and this might generalize to other short-text-ranking problems as
  well.

The top 10 features with the highest relative influence, as output by
the GBM package, are shown in Figure 2.

5. CONCLUSIONS 

We study the problem of identifying relevant facet-value pairs for
keyword-based search queries. This task is important because it can be
applied in many applications, including structured document
retrieval/filtering [9][5], interactive retrieval [6][3], structured
document summarization (snippet generation) [8], etc. We propose a
supervised learning approach for this problem and evaluate it on a movie
data set. Experiment results show that the proposed approach can
identify relevant facet-value pairs effectively.